Encoding audio signals

ABSTRACT

A system for transmitting audio signals over a telecommunications link generates the signals as two or more alternative feeds, for example at different data rates. The two feeds are encoded using coding methods having a frame structure with different frame lengths. To facilitate switching between the two, the input signal is notionally divided into temporal portions and each is coded by taking it, plus enough of the next (or preceding) portion to make up a whole number of frames, and encoding it, whereby the encoded portions overlap—at least for one of the feeds. The overlap is lost upon decoding by discarding duplicate material.

[0001] The present invention is concerned with the delivery, over a telecommunications link, of digitally coded material for presentation to a user.

[0002] According to one aspect of the invention there is provided a method of encoding audio signals comprising

[0003] notionally dividing an input signal into successive temporal portions;

[0004] encoding said input temporal portions using a first encoding algorithm having a first frame length to produce a first encoded sequence of encoded temporal portions;

[0005] encoding said input temporal portions using a second frame length to produce a second sequence of encoded temporal portions;

[0006] wherein at least one of the encoding steps comprises encoding one input temporal portion along with so much of the end of the preceding temporal portion and/or the beginning of the immediately following temporal portion as to constitute with said one temporal portion an integral number of frames.

[0007] In another aspect, the invention provides a method of encoding input audio signals comprising: encoding with a first coding algorithm having a first frame length each of successive first temporal portions of the input signal, which portions correspond to an integral number of said first frame lengths and either are contiguous or overlap, to produce a first encoded sequence; encoding with a second coding algorithm having a second frame length each of successive second temporal portions of the input signal, which portions correspond to an integral number of said second frame lengths and do not correspond to an integral number of said first frame lengths and which overlap, to produce a second encoded sequence such that each overlap region of the second encoded sequence encompasses at least partially a boundary between, or, as the case may be, overlap region portions of, the first encoded sequence which correspond to successive temporal portions of the input signal.

[0008] Other, optional, aspects of the invention are set out in the sub-claims.

[0009] Note that the following description and drawings are identical to those contained in our co-pending international patent application entitled “Delivery of Audio and or Video Material” (Applicant's ref A26097) filed on the same day as the present application, claiming priority from GB 00 30706.6.

[0010] Some embodiments of the present invention will now be described, with reference to the accompanying drawings, in which:

[0011] FIG. 1 is a diagram illustrating the overall architecture of the systems to be described;

[0012] FIG. 2 is a block diagram of a terminal for use in such a system;

[0013] FIG. 3 shows the contents of a typical index file;

[0014] FIG. 4 is a timing diagram illustrating a modified method of sub-file generation; and

[0015] FIG. 5 is a diagram illustrating a modified architecture.

[0016] The system shown in FIG. 1 has as its object the delivery, to a user, of digitally coded audio signals (for example, of recorded music or speech) via a telecommunications network to a user terminal where the corresponding sounds are to be played to the user. However, as will be discussed in more detail below, the system may be used to convey video signals instead of, or in addition to, audio signals. In this example, the network is the internet or other packet network operating in accordance with the Hypertext Transfer Protocol (see RFCs 1945/2068 for details), though in principle other digital links or networks can be used. It is also assumed that the audio signals have been recorded in compressed form using the ISO MPEG-1 Layer III standard (the “MP3 standard”); however it is not essential to use this particular format. Nor, indeed, is it necessary that compression be used, though naturally it is highly desirable, especially if the available bit-rate is restricted or storage space is limited. In FIG. 1, a server 1 is connected via the internet 2 to user terminals 3, only one of which is shown. The function of the server 1 is to store data files, to receive from a user terminal a request for delivery of a desired data file and, in response to such a request, to transmit the file to the user terminal via the network. Usually such a request takes the form of a first part indicating the network delivery mechanism (e.g. http:// or ftp:// for the hypertext transfer protocol or file transfer protocol respectively) followed by the network address of the server (e.g. www.server1.com) suffixed with the name of the file that is being requested. Note that, in the examples given, such names are, for typographical reasons, shown with the “//” replaced by “\\”.

[0017] In these examples, the use of the hypertext transfer protocol is assumed; this is not essential, but is beneficial in allowing use of the authentication and security features (such as the Secure Sockets Layer) provided by that protocol.

[0018] Conventionally, a server for delivery of MP3 files takes the form of a so-called streamer which includes processing arrangements for the dynamic control of the rate at which data are transmitted depending on the replay requirements at the user terminal, for the masking of errors due to packet loss and, if user interaction is allowed, the control of the flow of data between server and client; here however the server 1 contains no such provision. Thus it is merely an ordinary “web server”.

[0019] The manner in which the data files are stored on the server 1 will now be explained. Suppose that an MP3-format file has been created and is to be stored on the server. Suppose that it is a recording of J. S. Bach's Toccata and Fugue in D minor (BWV565) which typically has a playing time of 9 minutes. Originally this would have been created as a single data file, and on a conventional streamer would be stored as this one single file. Here, however, the file is divided into smaller files before being stored on the server 1. We prefer that each of these smaller files is of a size corresponding to a fixed playing time, perhaps four seconds. With a compressed format such as MP3 this may mean that the files will be of different sizes in terms of the number of bits they actually contain. Thus the Bach file of 9 minutes duration would be divided into 135 smaller files each representing four seconds' playing time. In this example these are given file names which include a serial number indicative of their sequence in the original file, for example:

[0020] 000000.bin

[0021] 000001.bin

[0022] 000002.bin

[0023] 000003.bin

[0024] . . .

[0025] . . .

[0026] 000134.bin

[0027] The partitioning of the file into these smaller sub-files may typically be performed by the person preparing the file for loading onto the web server 1. (The expression “sub-files” is used here to distinguish them from the original file containing the whole recording: it should however be emphasised that, as far as the server is concerned, each “sub-file” is just a file like any other file). The precise manner of their creation will be described more fully below. Once created, these sub-files are uploaded onto the server in a conventional manner just like any other file being loaded onto a web server. Of course the filename could also contain characters identifying the particular recording (the sub-file could also be “tagged” with additional information, much as an MP3 file carries information on the author, copyright etc. that is shown when it is played), but in this example the sub-files are stored on the server in a directory or folder specific to the particular recording, e.g. mp3_bwv565. Thus a sub-file, when required, may be requested in the form:

[0028] http:\\www.server1.com/mp3_bwv565/000003.bin

[0029] where “www.server1.com” is the URL of the server 1.
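The naming scheme just described can be illustrated with a short sketch. It is offered only as a minimal illustration in Python; the function names are hypothetical and do not appear in the description, and the URL is written with the conventional “//” rather than the typographical “\\”.

```python
import math

SUBFILE_SECONDS = 4  # fixed playing time per sub-file, as in the example


def subfile_count(total_seconds: float, subfile_seconds: float = SUBFILE_SECONDS) -> int:
    """Number of sub-files needed to cover the recording (the last may be short)."""
    return math.ceil(total_seconds / subfile_seconds)


def subfile_name(index: int) -> str:
    """Six-digit, zero-padded serial number followed by '.bin'."""
    return f"{index:06d}.bin"


def subfile_url(base_url: str, directory: str, index: int) -> str:
    """Join the server address, the recording directory and the sub-file name."""
    return f"{base_url}/{directory}/{subfile_name(index)}"


count = subfile_count(9 * 60)       # the 9-minute Bach recording
print(count)                        # 135
print(subfile_name(count - 1))      # 000134.bin
print(subfile_url("http://www.server1.com", "mp3_bwv565", 3))
```

Because the names follow from the serial number alone, a terminal that knows the directory needs no further information to request any sub-file.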

[0030] It is also convenient for the person preparing the sub-files for loading onto the server to create, for each recording, a link page (typically in html format) which is also stored on the server (perhaps with filename mp3_bwv565/link.htm), the structure and purpose of which will be described later.

[0031] It is also convenient that the web server stores one or more (html) menu pages (e.g. menu.htm) containing a list of recordings available, with hyperlinks to the corresponding link pages.

[0032] Turning now to the terminal, this may typically take the form of a conventional desktop computer, with, however, additional software for handling the reception of the audio files discussed. If desired, the terminal could take the form of a handheld computer, or even be incorporated into a mobile telephone. Thus FIG. 2 shows such a terminal with a central processor 30, memory 31, a disk store 32, a keyboard 33, video display 34, communications interface 35, and audio interface (“sound card”) 36. For video delivery, a video card would be fitted in place of, or in addition to, the card 36. In the disk store are programs which may be retrieved into the memory 31 for execution by the processor 30, in the usual manner. These programs include a communications program 37 for call-up and display of html pages (that is, a “web browser” program such as Netscape Navigator or Microsoft Explorer) and a further program 38, referred to here as “the player program”, which provides the functionality necessary for the playing of audio files in accordance with this embodiment of the invention. Also shown is a region 39 of the memory 31 which is allocated as a buffer. This is a decoded audio buffer containing data waiting to be played (typically the playout time of the buffer might be 10 seconds). The audio interface or sound card 36 can be a conventional card and simply serves to receive PCM audio and convert it into an analogue audio signal, e.g. for playing through a loudspeaker. Firstly, we will give a brief overview of the operation of the terminal for the retrieval and playing of the desired recording when using HTTP and an embedded or “plugin” client:

[0033] 1. The user uses the browser to retrieve and display the menu page menu.htm from the server 1.

[0034] 2. The user selects one of the hyperlinks within the menu page, which causes the browser to retrieve from the server, and display, the link page for the desired recording (in this example the file mp3_bwv565/link.htm). The actual display of this page is unimportant (except that it may perhaps contain a message to reassure the user that the system is working correctly). What is important about this page is that it contains a command (or “embed tag”) to invoke in the processor 30 a secondary process in which the player program 38 is executed. The invocation of a secondary process in this manner is well-known practice (such a process is known in Netscape systems as a “plug-in” and in Microsoft systems as “ActiveX”). Such commands can also contain parameters to be passed to the secondary process, and in the system of FIG. 1 the command contains the server URL of the recording, which, for the Bach piece, would be http:\\www.server1/mp3_bwv565.

[0035] 3. The player program 38 includes an MP3 decoder, the operation of which is, in itself, conventional. Of more interest in the present context are the control functions of the program, which are as follows.

[0036] 4. The player program, having received the URL, adds to this the filename of the first sub-file, to produce a complete address for the sub-file, i.e. www.server1/mp3_bwv565/000000.bin. It will be observed that this system is organised on the basis that the sub-files are named in the manner indicated above, so that the terminal does not need to be informed of the filenames. The program constructs a request message for the file having this URL and transmits it to the server 1 via the communications interface 35 and the internet 2. (Processes for translating the URL into an IP address and for error reporting of invalid, incomplete or unavailable URLs are conventional and will not therefore be described). We envisage that the player program would send the requests directly to the communications interface, rather than via the browser. The server responds by transmitting the required sub-file.

[0037] 5. The player program determines from the file the audio encoding used in this sub-file and decodes the file back to raw PCM values in accordance with the relevant standard (MP3 in this example), making a note of the play time of this sub-file. Generally an audio file contains an identifier at the beginning of the file which states the encoding used. The decoded audio data is then stored in the audio buffer 39.

[0038] 6. The player program has a parameter called the playout time T_(p). In this example it is set at 10 seconds (it could be made user-selectable, if desired). It determines the degree of buffering that the terminal performs.

[0039] 7. The player program increments the filename to 000001.bin and requests, receives, decodes and stores this second sub-file as described in (4) and (5) above. It repeats this process until the contents of the buffer reach or exceed the playout time T_(p). Note that it is not actually essential that the decoding occurs before the buffer, but doing so simplifies matters: since the audio is decoded back to raw PCM, the duration of the buffered material is explicitly known. It also simplifies the control of the audio buffer if each of the sub-files has the same audio playback duration.

[0040] 8. Having reached the playout threshold T_(p) the decoded data are sent from the buffer to the audio interface 36 which plays the sound through a loudspeaker (not shown).

[0041] 9. Whilst playing the sounds as in (8) above, the player program continually monitors the state of buffer fullness and whenever this falls below T_(p) it increments the filename again and obtains a further sub-file from the server. This process is repeated until a “file not found” error is returned.

[0042] 10. If, during this process, the buffer becomes empty, the player program simply ceases playing until further data arrives.
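The initial buffering phase of steps 4 to 7 can be sketched as follows. This is a minimal illustration only: the `fetch` and `decode_seconds` callables are hypothetical stand-ins for the HTTP request and the MP3 decoder, and the stand-in server simply pretends to hold 135 four-second sub-files.

```python
from typing import Callable, Optional, Tuple


def prebuffer(fetch: Callable[[str], Optional[bytes]],
              decode_seconds: Callable[[bytes], float],
              playout_time: float = 10.0) -> Tuple[float, int]:
    """Request successive sub-files until the buffer holds at least
    playout_time seconds of decoded audio.

    Returns (seconds buffered, index of the next sub-file to request).
    """
    buffered = 0.0
    index = 0
    while buffered < playout_time:
        data = fetch(f"{index:06d}.bin")
        if data is None:          # "file not found": end of the recording
            break
        buffered += decode_seconds(data)  # play time of the decoded sub-file
        index += 1
    return buffered, index


def fake_fetch(name: str) -> Optional[bytes]:
    """Stand-in server holding sub-files 000000.bin to 000134.bin."""
    return b"\x00" if int(name[:6]) < 135 else None


buffered, next_index = prebuffer(fake_fetch, lambda data: 4.0)
print(buffered, next_index)  # 12.0 3
```

With four-second sub-files and a 10-second playout time, three sub-files are fetched before playing starts. During playback the same request step would be repeated whenever the buffer fullness falls below the playout time.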

[0043] The sub-file naming convention used here, of a simple fixed length sequence of numbers starting with zero, is preferred as it is simple to implement, but any naming convention can be used provided the player program either contains (or is sent) the name of the first sub-file and an algorithm enabling it to calculate succeeding ones, or alternatively is sent a list of the filenames.

[0044] It will have been observed that the system described above offers the user no opportunity to intervene in the replay process. Nor does it offer any remedy for the possibility of buffer underflow (due for example to network congestion). Therefore a second, more sophisticated embodiment of the invention, now to be described, offers the following further features:

[0045] a) the server stores two or more versions of the recording, recorded at different compression rates (for example at compressions corresponding to (continuous) data rates of 8, 16, 24 and 32 kbit/s respectively) and the player program is able to switch automatically between them.

[0046] b) the player program displays to the user a control panel whereby the user may start the playing, pause it, restart it (from the beginning, or from the point at which it paused), or jump to a different point in the recording (back or forward).

[0047] Note that these features are not interdependent, in that user control could be provided without rate-switching, or vice versa.

[0048] In order to provide for rate switching, the person preparing the file for loading onto the server prepares several source files by encoding the same PCM file several times at different rates. He then partitions each source file into sub-files, as before. These can be loaded onto the server in separate directories corresponding to the different rates, as in the following example structure, where “008k” or “024k” in the directory name indicates a rate of 8 kbit/s or 24 kbit/s, and so on.

[0049] He also creates an index file (e.g. index.htm) the primary purpose of which is to provide a list of the data rates that are available.

Directory     Subdirectory   Filename
mp3_bwv565    none           link.htm, index.htm
mp3_bwv565    008k_11_m      000000.bin, 000001.bin, 000002.bin, 000003.bin, . . ., 000134.bin
mp3_bwv565    016k_11_m      000000.bin, 000001.bin, 000002.bin, 000003.bin, . . ., 000134.bin
mp3_bwv565    018k_11_s      000000.bin, 000001.bin, 000002.bin, 000003.bin, . . ., 000134.bin
mp3_bwv565    024k_11_s      000000.bin, 000001.bin, 000002.bin, 000003.bin, . . ., 000134.bin
mp3_bwv565    032k_11_s      000000.bin, 000001.bin, 000002.bin, 000003.bin, . . ., 000134.bin

[0050] The index file would thus contain a statement of the form:

<!--Audio="024k_11_s 032k_11_s 018k_11_s 016k_11_m 008k_11_m"-->

[0051] (The <!--...--> simply indicates that the statement is embedded as a comment in an html file (or a simple text file could be used)). A typical index file is shown in FIG. 3 where other information is included: LFI is the highest sub-file number (i.e. there are 45 sub-files) and SL is the total playing time (178 seconds). “Mode” indicates “recorded” (as here) or “live” (to be discussed below). The other entries are either self-explanatory, or standard html commands.
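A player might extract the list of rate directories from the index file along the following lines. This is a hedged sketch: the function name is hypothetical, the regular expression assumes the statement is written with plain double quotes inside a standard html comment, and the other fields of FIG. 3 are not handled.

```python
import re
from typing import List


def parse_audio_rates(index_text: str) -> List[str]:
    """Return the rate directory names from an Audio statement embedded
    as an html comment, in the order they are listed; [] if none is found."""
    match = re.search(r'<!--\s*Audio\s*=\s*"([^"]*)"\s*-->', index_text)
    return match.group(1).split() if match else []


sample = '<html><!--Audio="024k_11_s 032k_11_s 018k_11_s 016k_11_m 008k_11_m"--></html>'
rates = parse_audio_rates(sample)
print(rates[0])    # 024k_11_s
print(len(rates))  # 5
```

The first entry in the returned list is the directory from which the player would begin requesting sub-files.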

[0052] Initially the player program will begin by requesting, from the directory specified in the link file, the index file, and stores locally a list of available data rates for future reference. (It may explicitly request this file or just specify the directory: most servers default to index.htm if a filename is not specified.) It then begins to request the audio sub-files as described earlier, from the first-mentioned “rate” directory in the index file, namely 024k_11_s (or the terminal could override this by modifying this to a default rate set locally for that terminal). The process from then on is that the player program measures the actual data rate being received from the server, averaged over a period of time (for example 30 seconds). It does this by timing every URL request; the transfer rate achieved (number of bits per second) between the client and server is determined. The accuracy of this figure improves as the number of requests goes up. The player maintains two stored parameters which indicate, respectively, the current rate, and the measured rate.
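The rate measurement can be sketched as below. The description does not say exactly how the average is formed, so this sketch assumes one plausible reading: total bits received divided by total transfer time, over the requests that finished within the averaging window. The class and method names are hypothetical.

```python
from collections import deque


class RateMeter:
    """Average transfer rate over the requests that completed in a recent window."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self.samples = deque()  # entries: (finish_time, bits, transfer_seconds)

    def record(self, finish_time: float, num_bytes: int, transfer_seconds: float) -> None:
        """Log one timed request and drop entries older than the window."""
        self.samples.append((finish_time, num_bytes * 8, transfer_seconds))
        while self.samples and self.samples[0][0] < finish_time - self.window:
            self.samples.popleft()

    def measured_rate(self) -> float:
        """Bits per second achieved over the retained requests (0.0 if none)."""
        total_bits = sum(bits for _, bits, _ in self.samples)
        total_time = sum(secs for _, _, secs in self.samples)
        return total_bits / total_time if total_time > 0 else 0.0


meter = RateMeter()
meter.record(finish_time=1.0, num_bytes=12000, transfer_seconds=2.0)
print(meter.measured_rate())  # 48000.0
```

The more requests fall inside the window, the less a single slow or fast transfer distorts the figure, which matches the remark that accuracy improves with the number of requests.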

[0053] The initiation of a rate change is triggered:

[0054] a) if the buffer ever empties AND the measured rate is less than the current rate AND the measured Buffer Low Percentage exceeds a Step Down Threshold (as described below), reduce the current rate; (changing at a time when the buffer is already empty is advantageous as the sound card is not playing anything and it may be necessary to reconfigure it if the audio sampling rate, stereo-mono setting or bit width (number of bits per sample) has changed).

[0055] b) if the measured rate exceeds not only the current rate but also the next higher rate for a given period of time (e.g. 120 seconds: this could if desired be made adjustable by the user), increase the current rate.

[0056] The Buffer Low Percentage is the percentage of the time that the buffer contents represent less than 25% of the playout time (i.e. the buffer is getting close to being empty). If the Step Down Threshold is set to 0% then when the buffer empties the system always steps down when the other conditions are satisfied. Setting the Step Down Threshold to 5% (this is our preferred default value) means that if the buffer empties but the measured Buffer Low Percentage does not exceed 5% it will not step down. Further buffer empties will obviously cause this measured percentage to increase, and the buffer will eventually empty again with a Buffer Low Percentage value exceeding 5% if the rate cannot be sustained. Setting the value to 100% means the client will never step down.
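The two trigger conditions (a) and (b) can be written as a single decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, rates are in the same units throughout, and a next higher rate of zero means that none is available.

```python
def choose_action(buffer_empty: bool,
                  measured_rate: float,
                  current_rate: float,
                  next_higher_rate: float,
                  buffer_low_pct: float,
                  seconds_above_next: float,
                  step_down_threshold_pct: float = 5.0,
                  hold_seconds: float = 120.0) -> str:
    """Return 'down', 'up' or 'stay' according to conditions (a) and (b)."""
    # (a) buffer empty AND measured rate below current rate
    #     AND Buffer Low Percentage above the Step Down Threshold
    if (buffer_empty
            and measured_rate < current_rate
            and buffer_low_pct > step_down_threshold_pct):
        return "down"
    # (b) measured rate above both the current and the next higher rate
    #     for at least the hold period
    if (next_higher_rate > 0
            and measured_rate > current_rate
            and measured_rate > next_higher_rate
            and seconds_above_next >= hold_seconds):
        return "up"
    return "stay"


print(choose_action(True, 20.0, 24.0, 32.0, buffer_low_pct=8.0, seconds_above_next=0.0))     # down
print(choose_action(False, 40.0, 24.0, 32.0, buffer_low_pct=0.0, seconds_above_next=150.0))  # up
```

With the threshold at 100% the first branch can never be taken, since a percentage cannot exceed 100, so the client never steps down.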

[0057] The actual rate change is effected simply by the player program changing the relevant part of the sub-file address, for example changing “008k” to “024k” to increase the data rate from 8 to 24 kbit/s, and changing the current rate parameter to match. As a result, the next request to the server becomes a request for the higher (or lower) rate, and the sub-file from the new directory is received, decoded and entered into the buffer. The process just described is summarised in the following flowchart:

User: Select Menu page
Terminal: Request http:\\server1.com/menu.htm
Server: Send http:\\server1.com/menu.htm
Terminal: Display menu.htm
User: Select item from Menu (Bach)
Terminal: Extract hyperlink URL from menu.htm (mp3_bwv565/link.htm)
Terminal: Request http:\\server1.com/mp3_bwv565/link.htm
Server: Send http:\\server1.com/mp3_bwv565/link.htm
Terminal: Display link.htm
Terminal: Execute secondary process (player program) specified in link.htm with parameters specified in link.htm (http:\\server1/mp3_bwv565)
Terminal: Set Stem to that specified
Terminal: Set URL = Stem + “index.htm”
Terminal: Request this URL
Server: Send requested file
Terminal: Set Rate List to rates specified in index.htm
Terminal: Set LFI to value specified in index.htm
Terminal: Set StemC = Stem + “/” + RateList(item 1)
Terminal: Set CurrentRate = rate specified in RateList (item 1)
Terminal: Set RateU = next higher rate in list or zero if none
Terminal: Set StemU = Stem + “/” + item in rate list corresponding to this rate
Terminal: Set RateD = next lower rate in list or zero if none
Terminal: Set StemD = Stem + “/” + item in rate list corresponding to this rate
Terminal: Set Current Subfile = 000000.bin
Terminal: J1: Set URL = StemC + Current Subfile
Terminal: Request this URL
Server: If requested subfile exists, Send requested subfile; otherwise send error message
Terminal: If error message received, Stop
Terminal: Decode received subfile
Terminal: Write to buffer
Terminal: J1A: If Buffer Fullness > Tp seconds go to Step J3
Terminal: J2: Increment Current Subfile
Terminal: Go to Step J1
Terminal: J3: Begin/continue playing of buffer contents via sound card
Terminal: J4: If Buffer Fullness < Tp seconds go to Step J2
Terminal: If BufferFullness = 0 AND MRate < CurrentRate AND BufferLow% > Td goto Stepdown
Terminal: If MRate > NextRate AND NextRate <> 0 goto Stepup
User: Input of user commands
Terminal: If UserCommand = Pause then: Stop reading from buffer; Loop until UserCommand = Resume; go to J3
Terminal: If UserCommand = Jump(j%) then: Clear buffer; Set CurrentSubfile = Integer[(LFI + 1) * j / 100]; goto Step J1
Terminal: Go to Step J1A
Terminal: Stepup: Clear Buffer
Terminal: Set RateD = RateC
Terminal: Set StemD = StemC
Terminal: Set RateC = RateU
Terminal: Set StemC = StemU
Terminal: Set RateU = next higher rate in list or zero if none
Terminal: Set StemU = Stem + “/” + item in rate list corresponding to this rate
Terminal: Go to Step J1A
Terminal: Stepdown: Clear Buffer
Terminal: Set RateU = RateC
Terminal: Set StemU = StemC
Terminal: Set RateC = RateD
Terminal: Set StemC = StemD
Terminal: Set RateD = next lower rate in list or zero if none
Terminal: Set StemD = Stem + “/” + item in rate list corresponding to this rate
Terminal: Go to Step J1A
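The bookkeeping of the “next higher” and “next lower” rates in the flowchart can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the rate in kbit/s is assumed to be given by the first three digits of the directory name, and `None` is used where the flowchart uses zero to mean “none”.

```python
from typing import List, Optional, Tuple


def rate_of(dirname: str) -> int:
    """Rate in kbit/s, assumed to be the first three digits of the name,
    e.g. '024k_11_s' gives 24."""
    return int(dirname[:3])


def neighbours(rate_dirs: List[str], current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (next lower, next higher) rate directory, or None where there is none."""
    ordered = sorted(rate_dirs, key=rate_of)
    i = ordered.index(current)
    down = ordered[i - 1] if i > 0 else None
    up = ordered[i + 1] if i + 1 < len(ordered) else None
    return down, up


dirs = ["024k_11_s", "032k_11_s", "018k_11_s", "016k_11_m", "008k_11_m"]
print(neighbours(dirs, "024k_11_s"))  # ('018k_11_s', '032k_11_s')
print(neighbours(dirs, "032k_11_s"))  # ('024k_11_s', None)
```

After a step up or step down, the player would make the chosen neighbour the current directory and recompute both neighbours from the same list.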

[0058] The user control is implemented by the user being offered on the screen the following options which he can select using the keyboard or other input device such as a mouse:

[0059] a) Start: implement the numbered steps given above, from step 4. Whether, when a recording is first selected, it begins to play automatically, or requires a Start instruction from the user, is optional; indeed, if desired, the choice may be made by means of an additional “autoplay” parameter in the link file.

[0060] b) Pause: implemented by an instruction to the MP3 decoder to suspend reading data from the buffer;

[0061] c) Resume: implemented by an instruction to the MP3 decoder to resume reading data from the buffer;

[0062] d) Jump: implemented by the user indicating which part of the recording he wishes to jump to (for example by moving a cursor to a desired point on a displayed bar representing the total duration of the recording); the player then determines that this point is x % along the bar and calculates the number of the next sub-file needed, which is then used for the next request. In the Bach example with 135 sub-files, a request to play from a point 20% into the recording would result in a request for the 28th sub-file, i.e. 000027.bin. It will be apparent that this calculation is considerably simplified if each sub-file corresponds to the same fixed duration. We prefer, in the case of the jump, to suspend decoding and clear the buffer so that the new request is sent immediately, but this is not actually essential.
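The jump calculation follows the flowchart formula Integer[(LFI + 1) * j / 100], where LFI is the highest sub-file number and j the percentage selected. The sketch below is illustrative only; the function name is hypothetical, and the clamp to LFI for a jump to 100% is an added assumption not stated in the description.

```python
def jump_target(lfi: int, percent: float) -> int:
    """Index of the sub-file to request for a jump to `percent` of the recording."""
    index = int((lfi + 1) * percent / 100)
    return min(index, lfi)  # assumption: never go past the last sub-file


print(f"{jump_target(134, 20):06d}.bin")  # 135 sub-files: 000027.bin
print(f"{jump_target(124, 20):06d}.bin")  # 125 sub-files: 000025.bin
```

Because every sub-file covers the same playing time, no table of offsets is needed; the percentage maps directly onto a serial number.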

[0063] It is of interest to discuss further the process of partitioning the original file into sub-files. First, it should be noted that if (as in the first version described above), there is no expectation that a sub-file will be followed by a sub-file other than that which immediately follows it in the original sequence, then it matters little where the boundaries between the sub-files are located. In that case the sub-file size can be a fixed number of bits, or a fixed playing time length (or neither of these); the only real decision is how big the sub-files should be. Where jumps are envisaged (in time, or between different data rates) there are other considerations. Where, as with many types of speech or audio coding (including MP3), the signal is coded in frames, a sub-file should contain a whole number of frames. In the case of rate switching, it is, if not actually essential, highly desirable that the sub-file boundaries are the same for each rate, so that the first sub-file received for a new rate continues from the same point in the recording that the last sub-file at the old rate ended. To arrange that every sub-file should represent the same fixed time period (e.g. the 4 seconds mentioned above) is not the only way of achieving this, but it is certainly the most convenient. Note however that, depending on the coding system in use, the requirement that a sub-file should contain a whole number of frames may mean that the playing duration of the sub-files does vary slightly. Note that in this embodiment of the invention, the available data rates, though they use different degrees of quantisation, and differ as to whether they encode in mono or stereo, all use the same audio sampling rate and in consequence the same frame size. Issues that need to be addressed when differing frame sizes are used are discussed below.

[0064] As for the actual sub-file length, excessively short sub-files should preferably be avoided because (a) they create extra network traffic in the form of more requests, and (b) on certain types of packet networks (including IP networks) they are wasteful in that they have to be conveyed by smaller packets, so that the overhead represented by the requesting process and the packet header is proportionately greater. On the other hand, excessively large sub-files are disadvantageous in requiring a larger buffer and in causing extra delay when starting play and/or when jumps or rate changes are invoked. A sub-file size of between 30% and 130% of the playout time, or preferably around half the playout time (as in the examples given above), is found to be satisfactory.

[0065] The actual process of converting the sub-files can be implemented by means of a computer programmed in accordance with the criteria discussed. Probably it will be convenient to do this on a separate computer, from which the sub-files can be uploaded to the server.

[0066] Another refinement that can be added is to substitute a more complex sub-file naming convention so as to increase security by making it more difficult for an unauthorised person to copy the sub-files and offer them on another server. One example is to generate the filenames using a pseudo-random sequence generator, e.g. producing filenames of the form:

[0067] 01302546134643677534543134.bin

[0068] 94543452345434533452134565.bin

[0069] . . .

[0070] In this case the player program would include an identical pseudo-random sequence generator. The server sends the first filename, or a “seed” of perhaps four digits, and the player can then synchronise its generator and generate the required sub-file names in the correct sequence.
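The seeded naming scheme can be sketched as follows. This is an illustration only: it assumes Python's `random.Random` as the shared generator (which is deterministic for a given seed but is not cryptographically secure), a 26-digit name as in the examples above, and a hypothetical function name. No check for duplicate names is made.

```python
import random
from typing import List


def filename_sequence(seed: int, count: int, digits: int = 26) -> List[str]:
    """Generate `count` pseudo-random numeric filenames from a shared seed.

    Server-side preparation and the player both call this with the same
    seed, so both obtain the same names in the same order.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice("0123456789") for _ in range(digits)) + ".bin"
            for _ in range(count)]


server_names = filename_sequence(seed=1234, count=135)
player_names = filename_sequence(seed=1234, count=135)
print(server_names == player_names)  # True
```

In practice a generator with a published, stable algorithm would be needed on both sides, so that names prepared once on the server remain reproducible by every player.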

[0071] In the above example of rate-switching, all the data rates used had the same frame size, specifically they used MP3 coding of PCM audio sampled at 11.025 kHz and a (PCM) frame size of 1152 samples. If it is desired to accomplish rate switching between MP3 (or other) recordings having different frame sizes, problems arise due to the requirement that a sub-file should contain a whole number of frames, because the frame boundaries do not then coincide. This problem can be solved by the following modified procedure for creating the sub-files. It should be noted particularly that this procedure can be used in any situation where rate switching is required and is not limited to the particular method of delivery discussed above.

[0072] FIG. 4 shows diagrammatically a sequence of audio samples, upon which successive four-second segments are delineated by boundary marks (in the figure) B1, B2 etc. At 11.025 kHz, there are 44,100 samples in each segment.

[0073] 1. Encode the audio, starting at boundary B1, frame by frame, to create an MP3 sub-file, continuing until a whole number of frames having a total duration of at least four seconds has been encoded. With a frame size of 1152 samples, four seconds corresponds to 38.3 frames, so a sub-file S1 representing 39 frames will actually be encoded, representing a total duration of 4.075 seconds.

[0074] 2. Encode the audio, in the same manner, starting at boundary B2.

[0075] 3. Repeat, starting each time at a 4-second boundary, so that in this way a set of overlapping sub-files is generated for the whole audio sequence to be coded. The last segment (which may well be shorter than four seconds) has of course nothing following it, and is padded with zeroes (i.e. silence).

[0076] Coding of the other data rates using different frame sizes proceeds in the same manner.
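The arithmetic of the encoding steps above can be checked with a short sketch that computes which PCM samples each overlapping sub-file covers. The function name is hypothetical, and the sketch assumes that every sub-file, including the last, is extended to the same whole number of frames, with zero padding wherever the span runs past the end of the audio.

```python
import math
from typing import List, Tuple


def subfile_spans(total_samples: int,
                  sample_rate: int = 11025,
                  segment_seconds: int = 4,
                  frame_size: int = 1152) -> List[Tuple[int, int, int]]:
    """Return (start, end, zero_padding) in samples for each sub-file.

    Each sub-file starts on a segment boundary and runs for the smallest
    whole number of frames that covers one segment.
    """
    segment = sample_rate * segment_seconds        # 44,100 samples
    frames = math.ceil(segment / frame_size)       # 39 frames
    encoded_len = frames * frame_size              # 44,928 samples
    spans = []
    for start in range(0, total_samples, segment):
        end = start + encoded_len
        padding = max(0, end - total_samples)      # silence past the end
        spans.append((start, end, padding))
    return spans


spans = subfile_spans(total_samples=3 * 44100)
print(spans[0])                             # (0, 44928, 0)
print(spans[-1])                            # (88200, 133128, 828)
print(round((spans[0][1] - spans[0][0]) / 11025, 3))  # 4.075
```

Each sub-file overlaps its successor by 828 samples here, and only the final one needs padding. A different frame size changes the overlap but not the start positions, which is what keeps the boundaries aligned across rates.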

[0077] At the terminal, the control mechanisms are unchanged, but the decoding and buffering process is modified:

[0078] 1. Receive sub-file S1;

[0079] 2. Decode sub-file S1;

[0080] 3. Write into the buffer only the first four seconds of the decoded audio samples (discard the remainder);

[0081] 4. Receive sub-file S2;

[0082] 5. Decode sub-file S2;

[0083] 6. Write into the buffer only the first four seconds of the decoded audio samples;

[0084] 7. Continue with sub-file S3 etc.

[0085] In this way, it is ensured that the sub-file sets for all rates have sub-file boundaries which correspond at the same points in the original PCM sample sequence.
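The modified buffering at the terminal amounts to truncating each decoded sub-file before it is written to the buffer. The sketch below is a minimal illustration with a hypothetical function name; plain lists of integers stand in for decoded PCM samples, and each stand-in sample is numbered by its position in the original sequence so that the result can be checked.

```python
from typing import List


def trim_decoded(samples: List[int],
                 sample_rate: int = 11025,
                 segment_seconds: int = 4) -> List[int]:
    """Keep only the first segment's worth of decoded samples."""
    keep = sample_rate * segment_seconds   # 44,100 samples
    return samples[:keep]                  # the overlap is discarded


# Stand-ins for decoded sub-files S1 and S2 (39 frames each, overlapping).
decoded_s1 = list(range(0, 44928))
decoded_s2 = list(range(44100, 89028))

buffer: List[int] = []
buffer.extend(trim_decoded(decoded_s1))
buffer.extend(trim_decoded(decoded_s2))
print(buffer == list(range(88200)))  # True
```

The buffered sequence is contiguous with no repeated or missing samples, whichever rate each sub-file came from.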

[0086] Thus, each four-second period except the last is, prior to encoding, “padded” with audio samples from the next four-second period so as to bring the sub-file size up to a whole number of MP3 frames. If desired, the padding samples could be taken from the end of the preceding four-second period instead of (or as well as) the beginning of the following one.

[0087] Note that the MP3 standard allows (by a scheme known as “bit reservoir”) certain information to be carried over from one audio frame to another. In the present context, while this is acceptable within a sub-file, it is not acceptable between sub-files. However, since naturally the standard does not allow such carry-over at the end or beginning of a recording, this problem is easily solved by encoding each sub-file separately, as if it were a single recording.

[0088] Changes of sampling rate (and indeed switching between mono and stereo operation) have some practical implications for operation of the audio interface 36. Many conventional sound cards, although capable of operation at a range of different settings, require re-setting in order to change sampling rate, and necessarily this causes an interruption in their audio output. Thus in a further modification, we propose that the sound card could be run continuously at the highest sampling rate envisaged. When the player program is supplying, to the buffer, data at a lower sampling rate, this data is then up-sampled to this highest rate before or after the buffer. Similarly, if the card is always operated in stereo mode, decoded mono signals can be fed in parallel to both the left and right channels of the sound card input. Again, if the number of bits per sample of the decoded signal is lower than expected by the card, the number of bits can be increased by padding with zeros.
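The three conversions mentioned above can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, samples are signed integers, and up-sampling is done by simple sample repetition at an integer ratio, which is cruder than the interpolation filter a real player would probably use.

```python
from typing import List, Tuple


def upsample_repeat(samples: List[int], factor: int) -> List[int]:
    """Raise the sampling rate by an integer factor by repeating each sample."""
    return [s for s in samples for _ in range(factor)]


def mono_to_stereo(samples: List[int]) -> List[Tuple[int, int]]:
    """Feed the same mono sample to both the left and right channels."""
    return [(s, s) for s in samples]


def pad_bits(samples: List[int], from_bits: int, to_bits: int) -> List[int]:
    """Widen samples by appending zero bits, i.e. shifting left."""
    shift = to_bits - from_bits
    return [s << shift for s in samples]


mono_8bit = [1, -2, 3]
converted = mono_to_stereo(pad_bits(upsample_repeat(mono_8bit, 2), 8, 16))
print(converted[:2])  # [(256, 256), (256, 256)]
```

With all material converted to one format, the sound card never needs re-setting when the rate changes.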

[0089] Recollecting that the criteria discussed earlier for automatic data rate switching downwards envisaged a rate reduction only in cases of buffer underflow (involving therefore interruptions in the output), we note that with this modification such interruption can be avoided, and therefore a criterion which anticipates underflow and avoids it in the majority of cases can be employed. In this case the first of the three AND conditions mentioned above (namely, that the buffer is empty) would be omitted.

[0090] The same principle may be applied to the delivery of video recordings, or of course, video recordings with an accompanying sound track. In the simpler version, where there is only one recording, the system differs from the audio version only in that the file is a video file (e.g. in H.261 or MPEG format) and the player program incorporates a video decoder. The manner of partitioning the file into sub-files is unchanged.

[0091] As in the audio case, there may be two or more recordings corresponding to different data rates, selected by the control mechanism already described. Also one can provide additional recordings corresponding to different replay modes such as fast forward or fast reverse which can be selected by an extension of the user control facilities already described. Again, a systematic convention for file and directory naming can be followed so that the player program can respond to, for example, a fast forward command by amending the sub-file address.

[0092] The delivery of video recordings does however have further implications for file partitioning if switching or jumps are to be permitted. In the case of recordings where each frame of a picture is coded independently, it is sufficient that a sub-file contains a whole number of frames of a picture. If compression involving inter-frame techniques is in use, however, the situation is more complex. Some such systems (for example the MPEG standards) generate a mixture of independently coded frames (“intra-frames”) and predictively coded frames; in this case each sub-file should preferably begin with an intra-frame.

[0093] In the case of inter-frame coding systems such as the ITU H.261 standard, which do not provide for the frequent, regular inclusion of intra-frames, this is not possible. This is because, taking rate-switching as an example, if one were to request sub-file n of a higher bit rate recording followed by sub-file n+1 of a lower bit-rate recording, the first frame of the lower bit-rate sub-file would have been coded on an inter-frame basis using the last decoded frame of sub-file n of the lower rate recording, which of course the terminal does not have at its disposal; it has the last decoded frame of sub-file n of the higher rate recording. Thus serious mistracking of the decoder would occur.

[0094] In the case of switching between normal play and a fast play mode, the situation is in practice slightly different. On fast forward play at, for example, 5 times normal speed, one encodes only every 5th frame. In consequence the inter-frame correlation is much reduced and inter-frame coding becomes unattractive, so one would generally prefer to encode a fast play sequence as intra-frames. Switching from normal to fast then presents no problem, as the intra-frames can be decoded without difficulty. However, when reverting to normal play, the mistracking problem again occurs because the terminal is then presented with a predictively coded frame for which it does not have the preceding frame.

[0095] In either case the problem can be solved by using the principle described in our international patent application No. WO 98/26604 (issued in USA as U.S. Pat. No. 6,002,440). This involves the encoding of an intermediate sequence of frames which bridges the gap between the last frame of the preceding sequence and the first frame of the new sequence.

[0096] The operation of this will now be described in the context of fast forward operation (fast rewind being similar but in reverse). In this example we assume that a 9 minute video sequence has been encoded at 96 kbit/s according to the H.261 standard, and again at 5 times normal rate entirely as H.261 intra-frames, and that the resulting files have each been partitioned into four-second sub-files. Here, four seconds refers to the duration of the original video signal, not to the fast forward playing time. Following a naming convention similar to that employed above, the sub-files might be:

Directory   Subdirectory   Filename
mpg_name    096k_x1        000000.bin
                           000001.bin
                           . . .
                           000134.bin
            096k_x5        000000.bin
                           . . .
                           000134.bin

[0097] To switch from normal play to fast forward it is only necessary for the player program to modify the sub-file address to point to the fast forward sequence, e.g.

[0098] Request mpg_name/096k_x1/000055.bin is followed by

[0099] Request mpg_name/096k_x5/000056.bin
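
As a minimal sketch of this naming convention, the sub-file address can be built from the recording name, data rate, play speed and sub-file index. The helper names below are hypothetical, and the directory spelling follows the example table given earlier.

```python
def subfile_path(name, rate_kbit, speed, index):
    """Build a sub-file address such as mpg_name/096k_x1/000055.bin."""
    return f"{name}/{rate_kbit:03d}k_x{speed}/{index:06d}.bin"


def subfile_count(duration_s, subfile_s=4):
    """Number of sub-files needed to cover a recording (ceiling division)."""
    return -(-duration_s // subfile_s)


# Switch from normal play to fast forward after sub-file 55.
print(subfile_path("mpg_name", 96, 1, 55))  # mpg_name/096k_x1/000055.bin
print(subfile_path("mpg_name", 96, 5, 56))  # mpg_name/096k_x5/000056.bin
print(subfile_count(9 * 60))                # 135, i.e. 000000.bin to 000134.bin
```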

[0100] In order to construct the bridging sequences for switching back to normal play it is necessary to construct a bridging sub-file for each possible transition. As described in our international patent application mentioned above, a sequence of three or four frames is generally sufficient for bridging, so a simple method of implementation is to construct bridging sub-files of only 4 frames duration, e.g.

Directory   Subdirectory   Filename
mpg_name    096k_5>1       0000001.bin
                           . . .
                           000133.bin

[0101] So that the switching is accomplished by a series of requestssuch as:

[0102] Request mpg_name/096k_x5/000099.bin

[0103] Request mpg_name/096k_5>1/000099.bin

[0104] Request mpg_name/096k_x1/000100.bin
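
The request series for reverting from fast to normal play might be generated as sketched below. The function name is hypothetical, and the spelling of the bridging subdirectory (here "5>1") is an assumption based on the bridging table above.

```python
def fast_to_normal_requests(name, rate, fast_speed, last_fast_index):
    """Return the three requests: last fast sub-file, bridge, next normal."""
    n = last_fast_index
    return [
        f"{name}/{rate}_x{fast_speed}/{n:06d}.bin",   # last fast sub-file
        f"{name}/{rate}_{fast_speed}>1/{n:06d}.bin",  # bridging sub-file
        f"{name}/{rate}_x1/{n + 1:06d}.bin",          # first normal sub-file
    ]


for request in fast_to_normal_requests("mpg_name", "096k", 5, 99):
    print("Request", request)
```

The bridging sub-file carries the same index as the fast sub-file it follows, and normal play resumes at the next index.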

[0105] The bridging sub-file is generated as follows:

[0106] Decode the fast forward sequence to obtain a decoded version of the last frame of sub-file 99 (at 25 frames per second each four-second sub-file spans 100 frames, so this will be frame 10,000 of the original video signal).

[0107] Decode the normal sequence to obtain a decoded version of the first frame of sub-file 100 (i.e. frame 10,001). Re-encode this one frame four times using H.261 inter-frame coding based on the decoded frame 10,000 as the initial reference frame.

[0108] Thus, when the decoder has decoded the fast forward sub-file, followed by the bridging sub-file, it will have reconstructed frame 10,000 correctly and will be ready to decode the normal (×1) frames. Incidentally, the reason that one encodes the same frame several times in this procedure is that doing so merely once would produce poor picture quality due to the quantisation characteristics of H.261.

[0109] Exactly the same process could be used for rate-switching (albeit that now bridging sub-files are required in both directions). However, it will be observed that, as described, the bridging sub-file results in a freezing of the picture for a period of four frames, i.e. (at 25 frames per second) 160 ms. In switching from fast to normal play this is acceptable; indeed one would probably choose to clear the buffer at this point. It may or may not be subjectively acceptable on rate-switching. An alternative therefore would be to construct a four-second bridging sequence.

[0110] The request series would then look like:

[0111] mpg_name/096k_x1/000099.bin

[0112] mpg_name/096/128_x1/000100.bin

[0113] mpg_name/128k_x1/000101.bin

[0114] The bridging sub-file would in that case be constructed either by recoding the fifth decoded frame of the decoded 128 kbit/s sequence four times starting with decoded 96 kbit/s frame 10,000 (the last frame of sub-file 99, each four-second sub-file spanning 100 frames at 25 frames per second) as the reference frame, or coding the first four frames of the decoded 128 kbit/s sequence starting with decoded 96 kbit/s frame 10,000 as the reference frame. In both cases the remaining 96 frames of the bridging sub-file would be a copy of the 128 kbit/s sub-file.

[0115] The files to be delivered have been referred to as “recordings”. However, it is not necessary that the entire audio or video sequence should have been encoded, or even exist, before delivery is commenced. Thus a computer could be provided to receive a live feed, to code it using the chosen coding scheme, and generate the sub-files “on the fly” and upload them to the server, so that, once a few sub-files are present on the server, delivery may commence.

[0116] One application of this delivery system would be for a voice-messaging system, as illustrated in FIG. 5 where the server 1, network 2 and terminal 3 are again shown. A voice-messaging interface 4 serves to receive telephone calls, for example via the public switched telephone network (PSTN) 5, to record a message, encode it, partition it into sub-files, and upload them to the server 1, where they can be accessed in the manner described earlier. Alternatively a second interface 6 could be provided, operating in a similar manner to the terminal 3 but controlled remotely via the PSTN by a remote telephone 5, to which the replayed audio signals are then sent.

[0117] The same system can be used for a live audio (or video) feed. It is in a sense still “recorded”, the difference being primarily that delivery and replay commence before recording has finished, although naturally there is an inherent delay in that one must wait until at least one sub-file has been recorded and loaded onto the server 1.

[0118] The system can proceed as described above, and would be quite satisfactory except for the fact that replay would start at the beginning whereas what the user will most probably want is for it to start now, i.e. with the most recently created sub-file.

[0119] With a lengthy audio sequence one may choose to delete the older sub-files to save on storage: with a continuous feed (i.e. 24 hours a day) this will be inevitable and moreover one would need to reuse the sub-file names (in our prototype system we use 000000.bin to 009768.bin and then start again at 000000.bin), so that the older sub-files are constantly overwritten with the new ones. A simple method of ensuring delivery starting with the most recent sub-file would be to include in the index file an extra command instructing the player program to start by requesting the appropriate sub-file. This however has the disadvantage that the index file has to be modified very frequently, ideally every time a new sub-file is created. Therefore we propose a method whereby the player program scans the server to find the starting sub-file, as follows. In the index file, the Mode parameter is set to “live” to trigger the player program to invoke this method. LFI is set to indicate the maximum number of sub-files that may be stored, say 9768. The method involves the following steps and presupposes that (as is conventional) each sub-file's “last modified” time and date has been determined. When using the HTTP protocol this can be achieved using a HEAD request which results not in delivery of the requested sub-file but only of header information indicating the time that the sub-file was written to the server, or zero if the sub-file does not exist. This time is represented below as GetURL(LiveIndex) where LiveIndex is the sequence number of the sub-file in question. Comments are preceded by “//”.

1   LFI = 9768                      // read from the index.htm file
    LiveIndex = LFI / 2
    StepSize = LFI / 2
    LiveIndexModifiedAt = 0;        // the beginning of time.
10  ThisIndexWasModifiedAt = GetURL(LiveIndex);
20  If (StepSize == 1) {
        // LiveIndexModifiedAt contains the time the file was written or 0 if no file
        // has been found. LiveIndex contains the index.
        Goto 30
    }
    StepSize = StepSize / 2
    If (ThisIndexWasModifiedAt > LiveIndexModifiedAt) {
        LiveIndexModifiedAt = ThisIndexWasModifiedAt;
        LiveIndex = LiveIndex + StepSize
    } else {
        LiveIndex = LiveIndex - StepSize
    }
    Goto 10
30  FINISH
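
For illustration, the search above can be transcribed into runnable form as follows. This is a direct transcription rather than an improved algorithm. The names find_live_index and fake_server are hypothetical, integer division is assumed for the halving steps, and fake_server merely stands in for the HEAD requests by pretending that sub-files 0 to newest exist with increasing modification times.

```python
def find_live_index(get_mod_time, lfi):
    """Step-halving search; get_mod_time(i) returns 0 if sub-file i is absent."""
    if lfi < 2:
        raise ValueError("lfi must be at least 2 so that the step reaches 1")
    live_index = lfi // 2
    step = lfi // 2
    newest_seen = 0  # "the beginning of time"
    while True:
        this_time = get_mod_time(live_index)   # label 10
        if step == 1:                          # label 20
            return live_index                  # label 30, FINISH
        step //= 2
        if this_time > newest_seen:
            newest_seen = this_time
            live_index += step
        else:
            live_index -= step


def fake_server(newest):
    """Hypothetical stand-in for HEAD requests: sub-files 0..newest exist."""
    return lambda i: i + 1 if 0 <= i <= newest else 0


print(find_live_index(fake_server(5), 16))   # prints 5
print(find_live_index(fake_server(10), 16))  # prints 11
```

In the second call the search stops one index past the newest sub-file (10), which shows that the result is approximate and should be checked against the server before playing begins.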

[0120] Having found the LiveIndex it is prudent to step back by Tp (the playout time) and start to make the requests to fill the audio buffer from there. Playing may commence in the normal way.

[0121] Once the recording has actually finished, the index file can if desired be modified to set Mode to “recorded”, and to set any length parameters.

[0122] If desired the player program could check periodically to see whether the index file has changed from “live” to “recorded” mode and if so to switch to “recorded” mode playing.

[0123] A simpler and much faster method of identifying the “latest” sub-file will now be described, assuming, first of all, a single continuous sub-file numbering sequence.

[0124] 1. Terminal issues a HEAD request for the first sub-file (e.g. 000000.bin).

[0125] 2. The server replies by sending the header of this file and includes the date and time the file was last modified (MODTIME) and the date and time at which this reply was sent (REPLYTIME) (both of these are standard HTTP fields).

[0126] 3. The terminal calculates the elapsed time (ELTIME) by subtracting the two (ELTIME = REPLYTIME - MODTIME), and divides this by the playing duration of a sub-file (4 seconds, in these examples) to obtain LIVEINDEX = ELTIME/4.

[0127] 4. The terminal calculates the filename of the sub-file having this index.

[0128] 5. The terminal issues a HEAD request with this filename and if necessary each subsequent filename until it receives zero (file not found) whereupon it regards the latest sub-file which is found as the “Current sub-file”.

[0129] 6. The terminal begins requesting files, starting at point J1 of the flowchart given earlier.
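
Steps 2 to 4 above can be sketched as follows. It is assumed here that MODTIME and REPLYTIME are carried in the HTTP Last-Modified and Date header fields respectively. The function names and the example header values are hypothetical.

```python
from email.utils import parsedate_to_datetime


def live_index(last_modified, date, subfile_seconds=4):
    """LIVEINDEX = (REPLYTIME - MODTIME) / sub-file duration, rounded down."""
    modtime = parsedate_to_datetime(last_modified)   # MODTIME
    replytime = parsedate_to_datetime(date)          # REPLYTIME
    eltime = (replytime - modtime).total_seconds()   # ELTIME
    return int(eltime // subfile_seconds)


def subfile_name(index):
    """Six-digit sub-file name, as in the examples given earlier."""
    return f"{index:06d}.bin"


# Header values as they might be returned for 000000.bin (illustrative only).
index = live_index("Wed, 21 Oct 2015 07:28:00 GMT",
                   "Wed, 21 Oct 2015 07:29:40 GMT")
print(index, subfile_name(index))  # prints "25 000025.bin"
```

Because both times come from the server's own reply, the calculation does not depend on the terminal's clock being correct.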

[0130] This method is considerably faster than that described above for the cyclically numbered sub-files. Note that older sub-files may still be deleted, to reduce storage requirement, as long as the starting sub-file is kept. The method can however be modified to accommodate filename re-use (cyclic addresses), but would require:

[0131] (i) That the starting sub-file name (e.g. 000000.bin) is not re-used so that it is always available to supply the header information at Step 2. Thus, with wrapping at 009768.bin, sub-file 009768.bin would be followed by sub-file 000001.bin.

[0132] (ii) The calculated LIVEINDEX at Step 3 is taken modulo 9768 (i.e. the remainder when ELTIME/4 is divided by 9768).

[0133] (iii) Sub-file deletion always leads the creation of new sub-files so that a few file-names between the newest sub-file and the oldest undeleted sub-file do not exist, in order that the expected “file not found” response occurs at Step 5.
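
One possible reading of requirements (i) and (ii) is sketched below. The function name is hypothetical. The treatment of a zero remainder, which is mapped to the wrap file 009768.bin so that 000000.bin is never selected after the first cycle, is an assumption, since the text states only that the index is taken modulo 9768.

```python
WRAP = 9768  # highest sub-file number before names are re-used (from the text)


def cyclic_filename(elapsed_seconds, subfile_seconds=4, wrap=WRAP):
    """Map elapsed time to a sub-file name under cyclic re-use of 1..wrap."""
    index = int(elapsed_seconds // subfile_seconds)
    if index <= wrap:
        # First pass through the numbering: 000000.bin up to 009768.bin.
        return f"{index:06d}.bin"
    remainder = index % wrap
    # Assumed: remainder 0 denotes the wrap file, never the starting file.
    return f"{(remainder or wrap):06d}.bin"


print(cyclic_filename(0))            # 000000.bin
print(cyclic_filename(4 * 9768))     # 009768.bin
print(cyclic_filename(4 * 9769))     # 000001.bin
```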

[0134] There may be a danger of the playing operation running slightly faster or slower than the recording operation. To guard against the former it may be arranged that the player program checks each sub-file it receives to ascertain whether it is marked with a later time than the previous one: if not the sub-file is discarded and repeated requests made (perhaps three times) followed by a check of the index file if these requests are unsuccessful.

[0135] If the playing lags behind the recording process this can be identified by the player program occasionally checking the server for the existence of a significant number of sub-files more recent than those currently being requested, and if such sub-files do exist, initiating a “catching up” process, e.g. by regularly discarding a small amount of data.

1. A method of encoding audio signals comprising notionally dividing an input signal into successive temporal portions; encoding said input temporal portions using a first encoding algorithm having a first frame length to produce a first encoded sequence of encoded temporal portions; encoding said input temporal portions using a second frame length to produce a second sequence of encoded temporal portions; wherein at least one of the encoding steps comprises encoding one input temporal portion along with so much of the end of the preceding temporal portion and/or the beginning of the immediately following temporal portion as to constitute with said one temporal portion an integral number of frames.
2. A method according to claim 1 in which the first and second encoding algorithms correspond to respective different output data rates.
3. A method according to claim 1 or 2 including feeding one sequence of encoded temporal portions to an input and, in response to a switching command, switching the output to be supplied with the other sequence of encoded temporal portions, the switching occurring at the boundary between one encoded temporal portion and the next.
4. A method of transmitting audio signals comprising: encoding the signals using the method of claim 1, 2 or 3; decoding the discrete portions; and discarding that part of the decoded signal which corresponds to said end and/or beginning.
5. A method of encoding input audio signals comprising: encoding with a first coding algorithm having a first frame length each of successive first temporal portions of the input signal, which portions correspond to an integral number of said first frame lengths and either are contiguous or overlap, to produce a first encoded sequence; encoding with a second coding algorithm having a second frame length each of successive second temporal portions of the input signal, which portions correspond to an integral number of said second frame lengths and do not correspond to an integral number of said first frame lengths and which overlap, to produce a second encoded sequence such that each overlap region of the second encoded sequence encompasses at least partially a boundary between, or, as the case may be, overlap region portions of, the first encoded sequence which correspond to successive temporal portions of the input signal.